Abstract. We describe a new optimization scheme for finding high-quality clusterings in planar graphs that uses weighted perfect matching as a subroutine. Our method provides lower-bounds on the energy of the optimal correlation clustering that are typically fast to compute and tight in practice. We demonstrate our algorithm on the problem of image segmentation, where this approach outperforms existing global optimization techniques in minimizing the objective and is competitive with the state of the art in producing high-quality segmentations.¹

1 Introduction 

We tackle the problem of generic image segmentation, where the goal is to partition the pixels of an image into sets corresponding to objects and surfaces in a scene. Cues for this task can come from both bottom-up (e.g., local edge contrast) and top-down (e.g., recognition of familiar objects) sources. For closed domains where top-down information is available, this problem can be phrased in terms of labeling each pixel with one of several category labels or perhaps "background". There is a rapidly developing body of research in this area that integrates multiple cues, such as the output of a bank of object detectors, into a single model, typically formulated as a Markov random field over the pixel labels and some additional hidden variables [1,2,3,4,5,6,7].

When top-down information is not available, it may still be quite valuable to estimate image segments. Bottom-up segmentations provide candidate support for novel objects and can reduce the processing of the scene to the problem of understanding a small number of salient regions. Without a predefined set of labels, it is natural to describe the segmentation task as a graph partitioning problem in which pixels or superpixels have pairwise or higher-order similarities and the number of parts must be estimated. There is a rich history of applying graph partitioning techniques to image segmentation (e.g., [8,9,10,11,12]).

Here we consider the weighted correlation clustering objective, which sums the weights of the edges cut by a proposed partitioning of the graph. Edges may have both positive and negative weights. Correlation clustering is appealing since the optimal number of segments emerges naturally as a function of the edge weights rather than requiring an additional search over some model order parameter. Further, because the objective is linear in the edge weights, the problem of learning can be approached using techniques from structured prediction [13].

¹ This is the extended version of a paper to appear at the 12th European Conference on Computer Vision (ECCV 2012).

As with many non-trivial graph partitioning criteria, finding a minimum-weight correlation clustering is NP-hard for general graphs [14]. Demaine et al. [15] provide results on the hardness of approximation in general graphs by reduction to and from multiway cut [16]. Recently, Bachrach et al. [17] also showed that correlation clustering is NP-hard in planar graphs by a reduction from planar independent set.

Despite these difficulties, correlation clustering has seen a few recent applications to the image segmentation problem. Andres et al. [18] define a model for image segmentation that scores segmentations based on the sum of costs associated with each edge in the segmentation and optimize it using an integer linear programming (ILP) branch-and-cut strategy. Kim et al. [19] use a correlation clustering model for segmentation that includes higher-order potentials, or hyperedges, defining costs over sets of nodes; they optimize this model using linear programming (LP) relaxation techniques.

In this paper, we describe a new optimization strategy that specifically exploits the planar structure of the image graph. Our approach uses weighted perfect matching to find candidate cuts in re-weighted versions of the original graph and then combines these cuts into a final clustering. The collection of cuts forms constraints in a linear program that lower-bounds the energy of the true correlation clustering. In practice this lower-bound and the cost of the output clustering are almost always equal, yielding a certificate of global optimality. We compare this new optimization scheme to existing approaches based on both standard LP relaxations and ILP and find that our approach is substantially faster and provides tighter lower-bounds for a wide range of image segmentation problems.

2 Correlation Clustering 

Correlation clustering is a clustering criterion based on pairwise (dis)similarities. Let $G = (V, E)$ be an undirected graph with edge weights $\theta_e \in \mathbb{R}$ that specify the similarity or dissimilarity on an edge $e = (i, j)$ between vertices $i$ and $j$.

Correlation clustering seeks a clustering of the vertices into disjoint sets $V = V_1 \cup V_2 \cup V_3 \cup \dots$ that minimizes the total weight of edges between clusters.²

Let $X_e$ be a binary indicator variable specifying which edges are "cut" by the partitioning: $X_e = 0$ if edge $e = (u, v)$ lies within a cluster (i.e., $u, v \in V_i$) and $X_e = 1$ if $e$ runs between two clusters (i.e., $u \in V_i$, $v \in V_j$, $i \neq j$). Let $\mathcal{C}$ denote the set of configurations of $X$ that correspond to valid partitionings of the vertices. We can describe this set succinctly by the triangle inequalities

$$\mathcal{C} = \{X : X_{uv} + X_{vw} \ge X_{uw} \quad \forall u, v, w \in V\}$$

² This objective is equivalent (up to a constant) to the minimum-disagreement or maximum-agreement objectives mentioned in the literature [14,15].
These constraints enforce transitivity of the clustering: if edge $(u, w)$ is cut, then at least one of $(u, v)$ and $(v, w)$ must also be cut.
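To make the transitivity requirement concrete, the sketch below checks the triangle inequalities on a 0/1 indicator defined over all vertex pairs. It is an illustration only, not code from the paper: `indicator_from_labels` and `satisfies_triangle_inequalities` are hypothetical names, and we assume $X$ is given as a symmetric matrix over every pair of vertices (on a sparse graph the analogous constraints would be cycle inequalities).

```python
from itertools import permutations


# Illustrative sketch (not the paper's code): X is assumed to be a symmetric
# n x n 0/1 matrix defined over all vertex pairs.
def indicator_from_labels(labels):
    """X[u][v] = 1 exactly when vertices u and v carry different cluster labels."""
    n = len(labels)
    return [[int(labels[u] != labels[v]) for v in range(n)] for u in range(n)]


def satisfies_triangle_inequalities(X):
    """Check X[u][w] <= X[u][v] + X[v][w] for every ordered triple of distinct vertices."""
    n = len(X)
    return all(X[u][w] <= X[u][v] + X[v][w] for u, v, w in permutations(range(n), 3))


# Any labeling yields a consistent cut pattern: {0, 1} | {2}.
X_ok = indicator_from_labels([0, 0, 1])
# Cutting only the pair (0, 2) is not transitive: 0 ~ 1 and 1 ~ 2, yet 0 and 2 are split.
X_bad = [[0, 0, 1],
         [0, 0, 0],
         [1, 0, 0]]
print(satisfies_triangle_inequalities(X_ok))   # True
print(satisfies_triangle_inequalities(X_bad))  # False
```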

We can express the correlation clustering problem as

$$CC^* = \min_{X \in \mathcal{C}} \sum_{e \in E} \theta_e X_e$$

and refer to $CC^*$ as the cost or the energy of the optimal clustering. Where appropriate, we will use $CC^*(\theta)$ to indicate the dependence of this optimum on the parameters.

Unlike other graph partitioning objectives (min-cut, normalized cut, etc.), the edge weights can be both positive and negative. Furthermore, we do not specify the number of segments a priori or place any constraints on their size. Instead, these arise naturally from the edge weights. For example, if all the edge weights are negative, each vertex will be placed in a separate cluster. If all the edge weights are positive, the optimal solution is to place all vertices in a single cluster. This means that $CC^*$ is upper-bounded by 0, since placing all the vertices in the same cluster is a valid partitioning with cost 0.
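For very small graphs, $CC^*$ can be computed by exhaustive search over all set partitions, which is handy for checking the claims above. The following sketch is purely illustrative and exponential in the number of vertices; `brute_force_cc` and its helpers are hypothetical names, and edges are assumed to be `(u, v)` pairs with a parallel list of weights `theta`. Blocks of an enumerated partition need not be connected, but splitting a block into its connected pieces cuts no additional edges, so the minimum is unchanged.

```python
# Illustrative sketch: exhaustive CC* for tiny graphs (not how the paper optimizes).
def set_partitions(n):
    """Yield every partition of range(n) as a restricted-growth label list."""
    def extend(labels, num_blocks):
        if len(labels) == n:
            yield list(labels)
            return
        for b in range(num_blocks + 1):  # join an existing block or open a new one
            labels.append(b)
            yield from extend(labels, max(num_blocks, b + 1))
            labels.pop()
    yield from extend([], 0)


def clustering_cost(edges, theta, labels):
    """Sum of theta_e over edges whose endpoints lie in different clusters."""
    return sum((t for (u, v), t in zip(edges, theta) if labels[u] != labels[v]), 0.0)


def brute_force_cc(n, edges, theta):
    """Exact CC* by exhaustive enumeration (Bell(n) partitions, so tiny n only)."""
    return min(clustering_cost(edges, theta, lab) for lab in set_partitions(n))


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(brute_force_cc(4, square, [-1.0] * 4))  # -4.0: every edge is cut
print(brute_force_cc(4, square, [1.0] * 4))   # 0.0: a single cluster is optimal
```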

The correlation clustering objective appears very similar to standard pairwise Markov random field (MRF) models for image labeling. For example, if we knew the optimal solution consisted of $k$ clusters, we could convert the problem into a $k$-state MRF without any unary terms. In the next section we make this connection precise.



3 Clusterings and Colorings 

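Consider a partitioning of the graph represented by $X \in \mathcal{C}$. We call this partitioning $k$-colorable if there is some labeling $L : V \to \{1, 2, \dots, k\}$ of the vertices of the graph such that $X_{uv} = \mathbb{1}[L(u) \neq L(v)]$. For every graph, there is a minimal number of colors $\gamma(G)$ that is sufficient to represent all partitions. It is at least the chromatic number of $G$ (needed for the partition into singletons) and can exceed it, since adjacencies between clusters rather than between vertices must be respected; for example, a 6-cycle split into three pairs of adjacent vertices requires three colors. Let $\mathcal{C}_k$ be the set of partitionings that are representable by $k$ colors; then $\mathcal{C}_1 \subset \mathcal{C}_2 \subset \dots \subset \mathcal{C}_{\gamma(G)} = \mathcal{C}$. For example, since contracting each cluster of a planar graph yields another planar graph, the four-color theorem [20] shows that any partition of a planar graph can be represented by $k = 4$ labels, so $\mathcal{C} = \mathcal{C}_4$ for planar graphs.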

This provides a useful alternative formulation of correlation clustering in terms of vertex labels. Let $L_v \in \{1, \dots, k\}$ be a label variable for vertex $v$. Then we can define an equivalent optimization problem

$$CC_k^* = \min_{L} \sum_{(u,v) \in E} \theta_{uv} \, \mathbb{1}[L_u \neq L_v]$$

To produce a partitioning from the labeling, simply take the collection of connected components of the subgraph induced by each label in turn. If every partition of $G$ is $k$-colorable ($\mathcal{C} = \mathcal{C}_k$), then $CC_q^* = CC_k^* = CC^*$ for any $q \ge k$. In general $CC_q^* \ge CC_k^*$ for $q < k$, since the optimal partitioning of $G$ may not be $q$-colorable. The set of 2-colorable partitions is commonly referred to as the cuts of a graph.
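The conversion from a labeling to a partitioning described above can be sketched as follows: keep only the edges whose endpoints share a label, then take connected components. The name `labeling_to_partition` is our own, and the iterative depth-first search is just one convenient way to find components. The 4-cycle example shows that two colors can represent more than two segments.

```python
from collections import defaultdict


# Illustrative sketch: partition = connected components of each label's induced subgraph.
def labeling_to_partition(n, edges, labels):
    """Return a cluster id per vertex."""
    adj = defaultdict(list)
    for u, v in edges:
        if labels[u] == labels[v]:  # only edges inside a color class keep vertices together
            adj[u].append(v)
            adj[v].append(u)
    cluster = [-1] * n
    next_id = 0
    for s in range(n):
        if cluster[s] != -1:
            continue
        cluster[s] = next_id
        stack = [s]
        while stack:  # iterative DFS restricted to same-label edges
            x = stack.pop()
            for y in adj[x]:
                if cluster[y] == -1:
                    cluster[y] = next_id
                    stack.append(y)
        next_id += 1
    return cluster


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Two colors, four segments: no two adjacent vertices share a label.
print(labeling_to_partition(4, cycle, [0, 1, 0, 1]))  # [0, 1, 2, 3]
```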
Since every partition of a planar graph is 4-colorable, $CC^* = CC_4^*$. We could tackle the problem of planar correlation clustering using the standard set of tools for optimizing a 4-state MRF with mixed (attractive/repulsive) potentials. Since the combinatorial optimization is NP-hard, such methods generally give only approximate solutions. Furthermore, many of them perform poorly on problems with no unary potentials. Since the energy function is symmetric with respect to permutations of the labels, the true max-marginals for the node labels are uninformative, and one is forced to look at higher-order constraints. For example, the lower-bound provided by TRW [21,22] is simply the sum of the negative edge weights in the graph.

One interesting exception is the case of planar binary labeling problems. For planar graphs, the cost of the optimal binary labeling $CC_2^*$ can be computed by an efficient reduction to weighted perfect matching in a suitably augmented planar dual of the graph $G$. This idea was first described in the statistical physics literature by Kasteleyn [23] and Fisher [24] in the context of computing the partition function of an Ising model. Recently, this has been explored as a tool for finding MAP configurations of more general MRFs that include an external field [25,26,27,28,29].

Since 2-colorable partitions are a subset of 4-colorable partitions, finding the optimal 2-colorable partitioning does not necessarily give us the optimal clustering of a planar graph. The space of 4-colorable partitions is larger, so in general $CC_2^* \ge CC_4^*$. However, the optimal 2-coloring still provides some useful information about the optimal 4-colorable partition.

Proposition 1. For any graph, $0 \ge CC_2^* \ge CC_4^* \ge \frac{3}{2} CC_2^*$, so the cost of the optimal planar correlation clustering is bounded below by $\frac{3}{2}$ times the cost of the optimal 2-colorable partition.

Proof. For a partitioning described by some labeling $L$, let

$$S(a, b) = \sum_{(u,v) \in E \,:\, \{L_u, L_v\} = \{a, b\}} \theta_{uv}$$

denote the sum of weights of edges between vertices labeled $a$ and those labeled $b$. Take the 4-colorable partition whose cost is $CC_4^* = \sum_{a < b} S(a, b)$ and consider the 2-colorable partitions in which pairs of labels from the 4-coloring are merged. There are three such 2-colorings, with the following costs:

$$E_a = S(1,3) + S(1,4) + S(2,3) + S(2,4) \tag{1}$$
$$E_b = S(1,2) + S(1,4) + S(3,2) + S(3,4) \tag{2}$$
$$E_c = S(1,2) + S(1,3) + S(4,2) + S(4,3) \tag{3}$$
Summing these costs includes every possible $S$ term twice, so we have

$$CC_4^* = \tfrac{1}{2}(E_a + E_b + E_c) \tag{4}$$
$$\ge \tfrac{3}{2} \min\{E_a, E_b, E_c\} \tag{5}$$
$$\ge \tfrac{3}{2} CC_2^* \tag{6}$$

The first inequality follows since the smallest of the three terms can be no larger than one third of their sum. The second inequality follows since none of these 2-colorings can have lower cost than the optimal 2-coloring. The same approach can be used to relate any pair of $CC_q^*$ and $CC_k^*$.

Corollary 1. If $CC_2^* = 0$, then $CC_4^* = 0$.

Since the optimal 2-colorable clustering is within a constant factor of the optimum and can be computed very efficiently, it seems a likely candidate for finding approximate correlation clusterings for segmentation. However, in practice, it performs poorly on real image segmentation problems. In natural images, T-junctions where three different image segments come together are common, and such a junction cannot be 2-colored! In the next section, we devise a tighter bound which uses the 2-coloring as a subroutine.
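Proposition 1 and the T-junction remark can be checked numerically on tiny graphs. The sketch below is illustrative only (helper names are hypothetical): it verifies the identity $E_a + E_b + E_c = 2\sum_{a<b} S(a,b)$ for arbitrary 4-labelings and compares exhaustive values of $CC_2^*$ and $CC_4^*$. Labels run from 0 to 3 here rather than 1 to 4, and exhaustive search stands in for the matching-based computation of $CC_2^*$.

```python
import random
from itertools import product


# Illustrative sketch: exhaustive CC_q^* stands in for perfect matching (tiny graphs only).
def labeling_cost(edges, theta, labels):
    """Sum of theta_e over edges whose endpoints receive different labels."""
    return sum((t for (u, v), t in zip(edges, theta) if labels[u] != labels[v]), 0.0)


def cc_q(n, edges, theta, q):
    """CC_q^*: best cost over all q-labelings."""
    return min(labeling_cost(edges, theta, lab) for lab in product(range(q), repeat=n))


def merged_costs(edges, theta, labels4):
    """Costs E_a, E_b, E_c of the 2-colorings obtained by merging pairs of the 4 labels."""
    groupings = [
        {0: 0, 1: 0, 2: 1, 3: 1},  # E_a: {1,2} vs {3,4} in the text's 1-based labels
        {0: 0, 1: 1, 2: 0, 3: 1},  # E_b: {1,3} vs {2,4}
        {0: 0, 1: 1, 2: 1, 3: 0},  # E_c: {1,4} vs {2,3}
    ]
    return [labeling_cost(edges, theta, [g[l] for l in labels4]) for g in groupings]


# A T-junction in miniature: three mutually repulsive segments.
tri = [(0, 1), (1, 2), (0, 2)]
w = [-1.0, -1.0, -1.0]
cc2, cc4 = cc_q(3, tri, w, 2), cc_q(3, tri, w, 4)
print(cc2, cc4)  # -2.0 -3.0: a 2-coloring cannot cut all three edges
assert 0 >= cc2 >= cc4 >= 1.5 * cc2  # Proposition 1, tight on this instance

# Random check on a small graph with integer weights (exact float arithmetic).
rng = random.Random(0)
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
for _ in range(20):
    theta = [rng.randint(-3, 3) for _ in edges]
    lab = [rng.randrange(4) for _ in range(5)]
    assert sum(merged_costs(edges, theta, lab)) == 2 * labeling_cost(edges, theta, lab)
    c2, c4 = cc_q(5, edges, theta, 2), cc_q(5, edges, theta, 4)
    assert 0 >= c2 >= c4 >= 1.5 * c2
```

The random graph is complete and therefore not planar, but the proposition is stated for any graph.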



4 Lower-bounding Planar Correlation Clustering 

Dual-decomposition provides a very general framework for tackling difficult problems by splitting them into a collection of tractable sub-problems which are solved independently subject to the constraint that they agree on their solutions. This constraint is enforced in a soft way using Lagrange multipliers, which results in a dual solution that lower-bounds the original minimization problem. Decomposition techniques have been studied in the optimization community for decades. Dual-decomposition was used by Wainwright et al. [3T] to derive algorithms for inference in graphical models and has become increasingly popular in the computer vision literature recently due to its flexibility [30].

We consider bounding the planar correlation clustering by a decomposition into two sub-problems: an easier partitioning problem and an independent edge problem that does not enforce the clustering constraints. To make the partitioning problem tractable, we impose a constraint on the decomposition so that the cost of the optimal clustering can be computed. Recall our notation that $CC^*(\lambda)$ is the optimal correlation clustering cost associated with edge weights $\lambda$. Let $\Omega = \{\lambda : CC^*(\lambda) = 0\}$ be the set of edge weights for which the optimal clustering has zero cost. We can then write the following decomposition bound:
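$$CC^* = \min_{X \in \mathcal{C}} \sum_{e \in E} \theta_e X_e \tag{7}$$
$$= \max_{\lambda} \Big[ \min_{X \in \mathcal{C}} \sum_{e \in E} \lambda_e X_e + \min_{\bar{X}} \sum_{e \in E} (\theta_e - \lambda_e) \bar{X}_e \Big] \tag{8}$$
$$\ge \max_{\lambda \in \Omega} \Big[ \min_{X \in \mathcal{C}} \sum_{e \in E} \lambda_e X_e + \min_{\bar{X}} \sum_{e \in E} (\theta_e - \lambda_e) \bar{X}_e \Big] \tag{9}$$
$$= \max_{\lambda \in \Omega} \min_{\bar{X}} \sum_{e \in E} (\theta_e - \lambda_e) \bar{X}_e \tag{10}$$
$$= \max_{\lambda \in \Omega} \sum_{e \in E} \min\{\theta_e - \lambda_e, 0\} \tag{11}$$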

In equation (8) we have decomposed the original edge weights $\theta$ across two sub-problems. The first is a correlation clustering problem (identical in form to our original problem), while the second independently optimizes over all the edges (with no constraints on $\bar{X}$). For any choice of $\lambda$, these two objectives sum up to the original problem. Since the configurations $(X, \bar{X})$ in each sub-problem are optimized independently, the sum of their energies is a lower-bound for any $\lambda$, and the bound can be made tight (setting $\lambda_e = \theta_e$ recovers the original objective). In equation (9), we restrict the domain of $\lambda$ to those settings for which the clustering sub-problem has an optimum of zero. The inequality arises since we are maximizing the bound over a more restrictive set.



Finally, in equation (10) we have simplified the expression since the constraint on $\lambda$ entails that the first term is exactly zero, and in equation (11) $\bar{X}$ is optimized independently for each edge.
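Evaluating the bound in equation (11) for a given $\lambda$ is a one-liner. The sketch below (the name `decomposition_bound` is hypothetical) uses $\lambda = 0$, which always lies in $\Omega$ because every clustering then costs zero; the bound reduces to the sum of the negative edge weights, the same value as the TRW bound mentioned in Section 3. Checking $\lambda \in \Omega$ in general requires the 2-coloring oracle discussed in Section 5, which this helper deliberately omits.

```python
# Illustrative sketch: evaluates equation (11) but does NOT verify that lam lies in Omega.
def decomposition_bound(theta, lam):
    """Sum over edges of min(theta_e - lambda_e, 0); a lower bound on CC* only if lam is in Omega."""
    return sum(min(t - l, 0.0) for t, l in zip(theta, lam))


theta = [2.0, -1.0, 0.5, -3.0]
print(decomposition_bound(theta, [0.0] * len(theta)))  # -4.0, the sum of negative weights
```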



5 Bound Optimization using Linear Programming 

Lagrangian relaxation approaches typically use projected sub-gradient ascent or other non-smooth optimization techniques to tackle objectives like that shown in equation (9). Here it is difficult to compute the required (sub)gradient information since, for a given setting of $\lambda$, there is no obvious way to recover the full set of optimizing solutions for $X$ beyond the trivial solution $X_e = 0$ for all $e$. The constraint set $\Omega$ also appears quite complicated. However, we do have an efficient method for testing membership in $\Omega$. By our earlier proposition (Corollary 1),

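$$\Omega = \{\lambda : CC^*(\lambda) = 0\} = \{\lambda : CC_2^*(\lambda) = 0\} \tag{12}$$
$$= \Big\{\lambda : \sum_{e} \lambda_e X_e \ge 0 \quad \forall X \in \mathcal{C}_2\Big\} \tag{13}$$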

This expression highlights that $\Omega$ is a convex polyhedral cone defined by a set of linear inequalities. For a given $\lambda$, we can test membership and, if $\lambda \notin \Omega$, produce a violated constraint described by a negative-weight 2-colorable clustering. This provides a method to solve equation (11) using cutting planes to successively approximate the constraint set $\Omega$.
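The membership test and separation step can be illustrated with exhaustive enumeration standing in for the planar perfect-matching oracle that makes the actual method scale. In the sketch below, `most_violated_cut` and `in_omega` are hypothetical names; on a triangle, $\theta$ itself fails the test, while a hand-picked $\lambda$ passes it and gives a bound equal to $CC^*$.

```python
from itertools import product


# Illustrative sketch: exhaustive enumeration replaces the perfect-matching oracle.
def most_violated_cut(n, edges, lam):
    """Return (value, X) minimizing sum_e lam_e X_e over all 2-colorable cuts X.

    Enumerates 2^(n-1) labelings; vertex 0 is fixed to side 0 since swapping
    the two sides gives the same cut.
    """
    best_val, best_X = 0.0, [0] * len(edges)  # the empty cut always has value 0
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        X = [int(side[u] != side[v]) for u, v in edges]
        val = sum(l * x for l, x in zip(lam, X))
        if val < best_val:
            best_val, best_X = val, X
    return best_val, best_X


def in_omega(n, edges, lam, tol=1e-9):
    """lam is in Omega iff no 2-colorable cut has negative weight (equation (13))."""
    return most_violated_cut(n, edges, lam)[0] >= -tol


tri = [(0, 1), (1, 2), (0, 2)]
theta = [1.0, 1.0, -3.0]
print(most_violated_cut(3, tri, theta))  # (-2.0, [0, 1, 1]): theta is not in Omega
lam = [1.0, 1.0, -1.0]                    # hand-picked, satisfies theta <= lam <= max(0, theta)
print(in_omega(3, tri, lam))              # True
print(sum(min(t - l, 0.0) for t, l in zip(theta, lam)))  # -2.0, equal to CC* here
```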

We say that an edge $e$ is constrained by $\Omega$ for a given setting of $\lambda$ if there exists some cut $X \in \mathcal{C}_2$ with $X_e = 1$ for which $\sum_f \lambda_f X_f = 0$. If an edge $e$ is unconstrained, then we can decrease $\lambda_e$ and thereby increase the bound until it becomes constrained or until it is no longer cut in the independent edge problem.

To simplify the bound optimization, we first consider some additional constraints on $\lambda$. When maximizing the bound over $\lambda_e$, it is always the case that there is an optimal $\lambda_e \ge \theta_e$. Choosing $\lambda_e < \theta_e$ gives the edge $e$ positive weight in the independent edge problem, so it does nothing to increase the objective. Further, any amount by which $\lambda_e$ is less than $\theta_e$ can only make the constraints $\Omega$ more difficult to satisfy. Therefore, we are free to only consider $\lambda$ for which $(\theta_e - \lambda_e) \le 0$ without impacting the final bound. This simplifies the expression for the objective in equation (11) by removing the min.

We also impose upper bounds on $\lambda$. For edges with $\theta_e < 0$ we add the constraint that $\lambda_e \le 0$. For edges with $\theta_e \ge 0$ we impose the constraint that $\lambda_e = \theta_e$. These constraints are sensible in that they are coordinate-wise optimal (e.g., increasing $\lambda_e$ above $\theta_e \ge 0$ decreases the bound). In Appendix B, we show that any optimal $\lambda$ can be deformed to one which satisfies these additional constraints without loosening the bound. In practice these additional constraints make the bound optimization far more efficient.

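We can now write the bound optimization problem with these additional constraints explicitly as a standard linear program:

$$\max_{\lambda} \sum_{e \in E} (\theta_e - \lambda_e) \tag{14}$$
$$\text{s.t.} \quad \theta_e \le \lambda_e \le \max\{0, \theta_e\} \quad \forall e \in E$$
$$\sum_{e \in E} \lambda_e X_e \ge 0 \quad \forall X \in \mathcal{C}_2$$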

This LP has an exponential number of constraints, one for every possible 2-colorable partition $X$. To solve this LP efficiently, we use a cutting-plane approach that successively adds violated constraints to a collection. Our final algorithm for bound optimization is given in Figure 1 (left).

In our actual implementation, we perform one additional step. Each new constraint $X$ may partition the graph into multiple components. We break the cut $X$ up into the set of basic cuts, each of which isolates a component, and add this collection of constraints as a batch. With this modification, we find that in practice very few batches of constraints (typically 5-10) are necessary to produce a solution to the full linear program.
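Splitting a cut into basic cuts can be sketched with a union-find over the uncut edges: each resulting component contributes the cut that isolates it. The names, the union-find, and the removal of duplicate cuts below are our own assumptions, not details given in the paper. Because each cut edge of a valid cut separates two different components, the isolating cuts (counted before duplicates are removed) sum to $2X$, so if $\sum_e \lambda_e X_e < 0$ at least one basic cut is also violated.

```python
# Illustrative sketch: one isolating cut per component of the graph minus the cut edges.
def basic_cuts(n, edges, X):
    """Split cut X into isolating cuts, skipping empty and duplicate ones."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (u, v), x in zip(edges, X):
        if not x:  # uncut edges keep their endpoints in one component
            parent[find(u)] = find(v)
    roots = sorted({find(a) for a in range(n)})
    cuts = []
    for r in roots:
        iso = [int((find(u) == r) != (find(v) == r)) for u, v in edges]
        if any(iso) and iso not in cuts:  # skip empty and duplicate constraints
            cuts.append(iso)
    return cuts


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(basic_cuts(4, square, [1, 1, 1, 1]))
# [[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
```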
Lower-bound optimization
  $\mathcal{P} = \emptyset$
  while $CC_2^*(\lambda) < 0$ do
    $X = \arg\min_{X \in \mathcal{C}_2} \sum_{e \in E} \lambda_e X_e$
    $\mathcal{P} = \mathcal{P} \cup \{X\}$
    Solve (14) for $\lambda$ with partial constraint set $\mathcal{P} \subseteq \mathcal{C}_2$
  end while

Upper-bound decoding
  $S = 0$
  $\mathcal{E} = \{e : \theta_e - \lambda_e < 0\}$
  for $e \in \mathcal{E}$ do
    $X = \arg\min_{X \in \mathcal{C}_2,\, X_e = 1} \sum_{f \in E} \lambda_f X_f$
    $S' = \max(S, X)$ (elementwise)
    if $\sum_e \theta_e S'_e < \sum_e \theta_e S_e$ then
      $S = S'$
      $\lambda_f = 0$ for all $f$ with $X_f = 1$
    end if
  end for

Fig. 1. (left) Cutting-plane algorithm for computing the optimal lower-bound by successively adding constraints. (right) Upper-bound decoding by recursive partitioning.



6 Decoding upper-bounds 
6.1 Recursive Bipartitioning 

Once we have optimized the lower-bound, we would like to find a corresponding low-cost clustering. In general, there will be some edges for which $(\theta_e - \lambda_e) < 0$. In order for the bound to be tight, we need to find a clustering in which these edges are cut. As noted in the previous section, every such "must cut" edge is constrained by some cut $X$ that includes that edge. Although none of the individual minimal cuts $X$ may agree with all of the "must cut" edges in the independent sub-problem (second term in equation (9)), there is a minimal cut through each one of them.

Motivated by this intuition, one can use the following decoding technique. Start with the original clustering sub-problem, which has edge weights $\lambda$. Choose an ordering of those edges $e$ for which $(\theta_e - \lambda_e) < 0$. For each of these "must cut" edges in turn, find a zero-weight cut $X \in \mathcal{C}_2$ for the clustering sub-problem that cuts this edge, and add it to the final partitioning as long as it decreases the original objective. Remove these cut edges from the graph and continue on with the next edge. The pseudo-code is displayed in Figure 1 (right).
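The decoding loop of Fig. 1 (right) can be sketched in Python as below. The constrained oracle `min_cut_through` is a hypothetical exhaustive stand-in for a perfect-matching computation with edge $e$ forced to be cut, and setting $\lambda_f = 0$ for accepted cut edges is our reading of "remove these cut edges from the graph". The triangle instance uses a hand-picked $\lambda$ for which the lower-bound equals $-2$; the decoded clustering attains that value, certifying optimality.

```python
import random
from itertools import product


# Illustrative sketch: exhaustive oracle in place of perfect matching (tiny graphs only).
def cut_value(weights, X):
    return sum(w * x for w, x in zip(weights, X))


def min_cut_through(n, edges, lam, e):
    """argmin of sum_f lam_f X_f over 2-colorable cuts with X_e = 1."""
    u0, v0 = edges[e]
    best = None
    for side in product((0, 1), repeat=n):
        if side[u0] == side[v0]:
            continue  # edge e must be cut
        X = [int(side[u] != side[v]) for u, v in edges]
        if best is None or cut_value(lam, X) < cut_value(lam, best):
            best = X
    return best


def decode(n, edges, theta, lam, rng=None):
    """Recursive bipartitioning: union minimal cuts through 'must cut' edges while cost drops."""
    lam = list(lam)
    S = [0] * len(edges)
    must_cut = [e for e in range(len(edges)) if theta[e] - lam[e] < 0]
    if rng is not None:
        rng.shuffle(must_cut)  # the experiments use random orders
    for e in must_cut:
        X = min_cut_through(n, edges, lam, e)
        S_new = [max(s, x) for s, x in zip(S, X)]  # union of cut edges
        if cut_value(theta, S_new) < cut_value(theta, S):
            S = S_new
            for f, x in enumerate(X):
                if x:
                    lam[f] = 0.0  # accepted cut edges no longer cost anything
    return S


tri = [(0, 1), (1, 2), (0, 2)]
theta, lam = [1.0, 1.0, -3.0], [1.0, 1.0, -1.0]
S = decode(3, tri, theta, lam, rng=random.Random(0))
print(S, cut_value(theta, S))  # [0, 1, 1] -2.0
```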



6.2 Dual LP Rounding 



An alternative approach is to consider the dual LP to equation (14). Let $C$ be a matrix whose rows contain the indicator vectors for cuts $X \in \mathcal{C}_2$. Define the convex cone $\mathrm{cone}(\mathcal{C}_2) = \{C^T \alpha : \alpha \ge 0\}$, which is known as the "cut cone" [31]. It is straightforward to see that the set of valid partitions lives inside the cut cone ($\mathcal{C} \subset \mathrm{cone}(\mathcal{C}_2)$). Given a valid partition indicator vector $X$, we can write it as a linear combination of cuts, where each cut isolates an individual segment and the cuts are assigned a weight of $\alpha_i = 0.5$: every edge between two segments appears in exactly the two isolating cuts of those segments.
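The identity behind $\mathcal{C} \subset \mathrm{cone}(\mathcal{C}_2)$ is easy to check numerically: weight each segment's isolating cut by 0.5 and sum. The sketch below is illustrative (function names are our own), with segments given as one label per vertex.

```python
# Illustrative sketch: a partition indicator as a conic combination of isolating cuts.
def isolating_cuts(edges, segment):
    """One cut per segment s: X_e = 1 iff exactly one endpoint of e lies in s."""
    return [[int((segment[u] == s) != (segment[v] == s)) for u, v in edges]
            for s in sorted(set(segment))]


def conic_combination(alphas, cuts):
    """Compute C^T alpha, i.e. sum_i alpha_i * cut_i, edge by edge."""
    return [sum(a * cut[e] for a, cut in zip(alphas, cuts)) for e in range(len(cuts[0]))]


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
segment = [0, 1, 2, 2]  # three segments: {0}, {1}, {2, 3}
X = [int(segment[u] != segment[v]) for u, v in square]
cuts = isolating_cuts(square, segment)
print(X, conic_combination([0.5] * len(cuts), cuts))  # [1, 1, 0, 1] [1.0, 1.0, 0.0, 1.0]
```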
The dual LP to our lower-bound is given by

$$\min_{z} \; \theta^T z - \min(\theta, 0)^T \max(z - \mathbf{1}, 0) \quad \text{s.t.} \quad z \in \mathrm{cone}(\mathcal{C}_2) \tag{15}$$

The first term in the objective is exactly our original correlation clustering objective, where the binary indicator $X$ has been replaced by a real-valued $z$. The second term in the objective arises from the upper-bound constraints imposed on $\lambda$ and effectively cancels out the benefit of cutting any negative-weight edge by an amount of more than one (see Appendix A). To compute a solution to the dual, we solve (15) using a matrix $C$ that contains only those cut vectors in $\mathcal{P}$ produced during the lower-bound optimization. The resulting solution vector $z$ is thresholded to produce a final segmentation.
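The effect of the second term in (15) is easiest to see edge by edge. In the sketch below (the name `dual_objective` is hypothetical; $\min$ and $\max$ act elementwise), pushing a negative-weight edge from $z_e = 1$ to $z_e = 2$ leaves the objective unchanged, while doing so for a positive-weight edge doubles its cost.

```python
# Illustrative sketch: the objective of the dual LP (15), evaluated elementwise.
def dual_objective(theta, z):
    """theta^T z - min(theta, 0)^T max(z - 1, 0)."""
    return sum(t * ze - min(t, 0.0) * max(ze - 1.0, 0.0) for t, ze in zip(theta, z))


print(dual_objective([-2.0], [1.0]), dual_objective([-2.0], [2.0]))  # -2.0 -2.0
print(dual_objective([3.0], [1.0]), dual_objective([3.0], [2.0]))    # 3.0 6.0
```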



Taking the dual of equation (14) without the upper-bound constraints on $\lambda$ yields a nice interpretation of our algorithm as a convex relaxation of the original discrete clustering problem, in which the convex hull of $\mathcal{C}$ is approximated by the intersection of the cut cone and the unit cube:

$$\min_{z} \; \theta^T z \quad \text{s.t.} \quad z \in \mathrm{cone}(\mathcal{C}_2), \quad 0 \le z \le 1 \tag{16}$$

Since the cut cone for planar graphs can be described with a polynomial number of constraints [32], one could directly solve the dual LP in equation (16). Our bound optimization gains considerable efficiency by not using the full set of cuts $\mathcal{C}_2$. Instead, a small number of cutting planes (in the original LP) provides a delayed column generation scheme for solving the dual LP.



When only a subset of cuts is used, the second term in (15) and the corresponding constraints in the primal LP are necessary, since the bound can be tight without the optimal partition vector living in the subspace of the cut cone described by $\mathcal{P}$. Allowing solutions with $z > 1$ lets us access a larger set of partitions without increasing the dimensionality of the subspace.



7 Experiments 

We demonstrate the performance of our algorithm on correlation clustering problem instances from the Berkeley Segmentation Data Set [33,34]. Our clustering problem is defined on the superpixel graph given by performing the oriented watershed transform (owt) on the output of the "generalized probability of boundary" (gPb) boundary detector, as proposed by [31]. Each pair of superpixels that are adjacent in the image is connected by an edge whose weight is given by

where $gPb_e$ is the average gPb along the edge and $\beta$ is a threshold parameter that modulates the number of segments in the optimal clustering. Large $\beta$ results in more positive edges and hence coarser optimal segmentations. To compare different optimizers, we use 4 different settings, $\beta \in \{0.35, 0.27, 0.20, 0.12\}$, which produce segmentation outputs that cover a range of granularities. Parameters $\theta$ were rounded to 5 decimal places in order to simplify tests of convergence.

Fig. 2. Comparison of bound optimization on image segmentation problems. Each graph shows the distribution of results over 200 problem instances at four different threshold settings ranging from coarse ($\beta = 0.35$) to fine ($\beta = 0.12$). The left column shows the difference in the lower-bounds returned by the PlanarCC bound and MPLP using the set of face cycle constraints. Our code returned tight bounds in all but one instance, while the LP relaxation typically gave looser bounds. The right column shows the running times of our approach compared to the ILP branch-and-cut advocated by [18]. Here we plot relative speedup factors on a logarithmic scale. We find that the PlanarCC bound computation and decoding produce the same global optima as the ILP approach, but much faster.

We implemented our optimization using the BlossomV minimum-weight perfect matching code [35,36] and IBM's CPLEX solver to optimize the lower-bound. We used a tolerance of $10^{-6}$ as a stopping criterion for adding additional constraints to the lower-bound LP. We found that both decoding schemes work well. In the experiments described here, we computed up to ten upper bounds using the recursive bipartitioning procedure, each run using a random order for adding contours. This process terminated early if the lower- and upper-bounds were equal.

For a baseline lower-bounding scheme, we used max-product linear programming (MPLP) [37], which efficiently solves an LP relaxation of the original clustering objective. To represent the set of clusterings in terms of node labels, each superpixel takes on one of 4 states, and pairwise potentials encode the boundary strength between neighboring superpixels. As mentioned before, the standard edge-based relaxation is uninformative when unary potentials are absent, so we include the set of cycle constraints given by the collection of cycles that bound the planar faces of the superpixel graph. This is not sufficient to enforce consistency over all cycles in the graph but is a natural choice commonly used in the literature. In our experiments, we used a fast, in-house implementation of MPLP.

We also implemented the branch-and-cut ILP technique proposed by [18] using the CPLEX ILP solver. This approach finds an integral solution of the correlation clustering objective, removes the cut edges specified in the solution, and then produces a partition by finding connected components of the resulting graph. It then searches for inconsistent edges, namely cut edges that connect two nodes lying within the same connected component. If any such edges are found, a constraint is added to enforce consistency of that edge and the ILP is re-solved.
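The consistency check in this ILP scheme amounts to a connected-components pass. A minimal sketch, assuming the cut is given as a 0/1 list aligned with the edge list; the function name is hypothetical and this is not the implementation of [18]:

```python
# Illustrative sketch: find cut edges whose endpoints are still connected by uncut edges.
def inconsistent_edges(n, edges, X):
    """Indices of cut edges that violate consistency."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (u, v), x in zip(edges, X):
        if not x:
            parent[find(u)] = find(v)  # merge components along uncut edges
    return [i for i, ((u, v), x) in enumerate(zip(edges, X)) if x and find(u) == find(v)]


tri = [(0, 1), (1, 2), (0, 2)]
print(inconsistent_edges(3, tri, [0, 0, 1]))  # [2]: 0 and 2 are joined via vertex 1
print(inconsistent_edges(3, tri, [0, 1, 1]))  # []: a valid partition {0, 1} | {2}
```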

7.1 Bound optimization experiments 

Figure 2 shows a comparison of the lower-bounds generated by MPLP and by PlanarCC. We found that the time needed for MPLP to solve each problem is comparable to that of PlanarCC. However, the differences in the lower-bound are significant. With only the set of face cycles, MPLP is seldom able to produce a tight lower-bound. In contrast, the PlanarCC approach typically gives tight bounds with only 5-10 batches of cut constraints.

We found that the upper-bounds (solutions) generated by ILP and PlanarCC are very similar and very close to optimal, so we compare the time consumed by each algorithm as a function of $\beta$. In Figure 2 (right column) we show histograms of the comparative run times, $\log_{10}(T_{\mathrm{ILP}}/T_{\mathrm{PlanarCC}})$.

Note that the relative performance of PlanarCC improves as we move from a high-detail segmentation ($\beta = 0.12$) to a coarse segmentation ($\beta = 0.35$). For coarse segmentations (large $\beta$) the optimal solution contains many long contours and PlanarCC performs well relative to ILP, whereas for detailed segmentations ILP tends to fare better. For example, in the limit where all the edges have negative weight, the ILP approach or LP relaxation gives the correct answer (cut all edges) without the need for any constraints. However, on average, we find that the PlanarCC approach performs favorably across a range of useful thresholds on the BSDS images, giving speedups that range from 10x to 1000x.

7.2 Segmentation performance 

We benchmark the quality of the segmentations produced by correlation clustering for a range of thresholds $\beta$ on the BSDS500 test set. We use the same superpixels and local cues as the top-performing gPb+owt+UCM algorithm of Arbelaez et al. [34]. Figure 3 shows the benchmark results of our algorithm and two variants of the UCM algorithm. As visible in the figure, our algorithm performs comparably to UCM and slightly better than the results of [15], who report an F-measure of 0.70.

Fig. 3. Evaluation on the BSDS500 segmentation boundary benchmark. We compare segmentation performance to the state-of-the-art technique (gPb+owt+UCM) proposed by [34]. We use the same set of superpixels and contour cues derived from gPb+owt. We compare to two different variants of the UCM algorithm based on region merging. UCM performs a length-weighted average of the gPb along contours after each merge, while UCM-L performs a uniform average. We find that the globally optimal correlation clustering returned by our algorithm performs slightly better than the uniform-averaging version of UCM, but the length-weighted UCM gives better final performance.

The UCM algorithm is a region merging algorithm that successively merges the two adjacent regions that have the lowest-energy edge between them. Since this algorithm is "greedy" with respect to the clustering objective, we would expect it to occasionally merge two segments due to some small break in the contour contrast, a fate that our global optimization approach could avoid. However, as is clear from Figure 3, in practice the greedy nature of UCM does not seem to significantly hurt overall performance.

One explanation is that the UCM algorithm modifies the edge costs as it proceeds. After each merging step, any new contours that have been formed are reassigned the average of the underlying gPb. Our global clustering objective cannot capture this length-weighted averaging. Figure 3 shows the performance of the UCM algorithm with length-weighted averaging (UCM) and simple averaging (UCM-L). While our approach outperforms the non-length-weighted version, the differences are not substantial.



Fast Planar Correlation Clustering 






[Figure 4 panels for β = 0.35, 0.27 and 0.2; horizontal axis: Energy(UCM) - Energy(PlanarCC).]

Fig. 4. We find that our algorithm returns lower-energy segmentations than the UCM algorithm. This suggests either a mismatch between the correlation clustering model and the ground truth or that our model is using suboptimal settings for the local boundary cues.



Another possible explanation is that the greedy merging is truly successful in optimizing the correlation clustering objective. Figure 4 shows that this is not the case: while there is usually some UCM threshold that provides a segmentation with a fairly low-cost clustering, it is still suboptimal compared to the solutions returned by PlanarCC. This suggests that learning an optimal cue combination via structured prediction may improve performance.

Finally, it is worth noting that the boundary detection benchmark does not provide strong penalties for small leaks between two segments when the total number of boundary pixels involved is small. We found that on the region-based benchmarks, PlanarCC did outperform UCM slightly when the optimal segmentation threshold was chosen on a per-image basis (GT Covering OIS 0.65 versus 0.64 for UCM). We expect these differences may become more apparent in an application where the local boundary signal is noisier (e.g., biological imaging) or when there is a greater cost for under-segmentation.



8 Conclusion 



We have presented a novel, fast algorithm for finding high-quality correlation clusterings in planar graphs. Our algorithm appears to outperform existing approaches on a variety of real problem instances. Our method exploits decomposition into subproblems that lack efficient combinatorial algorithms but are still tractable in the sense of having efficient oracles. This offers a new technique in the toolkit of Lagrangian relaxations that we expect will find further use in the application of dual-decomposition to vision problems.
Appendix A: Derivation of the LP dual



It is informative to analyze the dual LP to the bound optimization presented in 
Equation (14) of the paper: 

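$$
\begin{aligned}
\max_{\lambda}\ & \sum_e (\theta_e - \lambda_e)\\
\text{s.t.}\ & \theta_e \le \lambda_e \le \max\{0, \theta_e\} && \forall e\\
& \sum_e \lambda_e X_e \ge 0 && \forall X \in \mathcal{C}
\end{aligned}\tag{17}
$$

where $\mathcal{C}$ denotes the set of cut indicator vectors.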

In order to write this in standard form, define

$$\hat\lambda_e = \lambda_e - \theta_e$$

and

$$\tilde\theta_e = \max(0, \theta_e) - \theta_e = -\min(0, \theta_e).$$

Let C be a matrix whose rows are the collection of cut indicator vectors X. Let 
us assume for now that C contains the entire set of cut vectors. We can write 
the LP in standard form: 

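$$
\begin{aligned}
\max_{\hat\lambda}\ & -\mathbf{1}^T \hat\lambda\\
\text{s.t.}\ & \hat\lambda \ge 0\\
& \hat\lambda \le \tilde\theta\\
& -C\hat\lambda \le C\theta
\end{aligned}\tag{18}
$$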

which has the following dual LP: 

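$$
\begin{aligned}
\min_{\alpha, \beta}\ & \theta^T C^T \alpha + \tilde\theta^T \beta\\
\text{s.t.}\ & \alpha \ge 0\\
& \beta \ge \max(0,\ C^T\alpha - \mathbf{1})
\end{aligned}\tag{19}
$$

where $\alpha$ and $\beta$ are the dual variables of the constraints $-C\hat\lambda \le C\theta$ and $\hat\lambda \le \tilde\theta$, respectively.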
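As a sanity check on the change of variables behind (18), the following minimal Python sketch (our own illustration, not the authors' code) uses a triangle, whose four cut indicator vectors form the rows of $C$, and checks on a few values of $\lambda$ that the constraints of (17) hold exactly when the transformed $\hat\lambda$ satisfies those of (18), and that the objectives agree. The triangle and the values of $\theta$ and $\lambda$ are arbitrary assumptions chosen for illustration; the helper names are hypothetical.

```python
# Assumed toy instance: a triangle with edges (0,1), (1,2), (0,2).
# Each row of C is one of its cut indicator vectors.
C = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def feasible_17(theta, lam):
    """Constraints of (17): theta_e <= lam_e <= max(0, theta_e) and C lam >= 0."""
    box = all(t <= l <= max(0, t) for t, l in zip(theta, lam))
    return box and all(dot(X, lam) >= 0 for X in C)


def to_standard_form(theta, lam):
    """Change of variables: lam_hat = lam - theta, theta_tilde = -min(0, theta)."""
    lam_hat = [l - t for l, t in zip(lam, theta)]
    theta_tilde = [-min(0, t) for t in theta]
    return lam_hat, theta_tilde


def feasible_18(theta, lam_hat, theta_tilde):
    """Constraints of (18): 0 <= lam_hat <= theta_tilde and -C lam_hat <= C theta."""
    box = all(0 <= h <= tt for h, tt in zip(lam_hat, theta_tilde))
    return box and all(-dot(X, lam_hat) <= dot(X, theta) for X in C)


theta = [-3, 2, 2]
for lam in ([-2, 2, 2], [-3, 2, 2], [1, 2, 2]):
    lam_hat, theta_tilde = to_standard_form(theta, lam)
    same_feasibility = feasible_17(theta, lam) == feasible_18(theta, lam_hat, theta_tilde)
    same_objective = sum(t - l for t, l in zip(theta, lam)) == -sum(lam_hat)
    print(lam, feasible_17(theta, lam), same_feasibility, same_objective)
```

The second $\lambda$ violates the cut constraint of (17) and the third violates the upper bound; in both cases the transformed point is rejected by (18) as well.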



To further simplify the expression, we define the set $\mathcal{K} = \{C^T\alpha : \alpha \ge 0\}$, which is the convex cone known as the "cut cone" [31]. Since $\tilde\theta \ge 0$, $\beta$ will always take on its minimum allowable value, so we can collapse it into the objective, yielding:

$$
\min_{z}\ \theta^T z - \min(\theta, 0)^T \max(z - \mathbf{1}, 0) \quad \text{s.t.}\quad z \in \mathcal{K} \tag{20}
$$

Observe that the second term in the objective vanishes when $z \le \mathbf{1}$ and is nonnegative otherwise; it is strictly positive whenever $z_e > 1$ for some edge with $\theta_e < 0$.
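The collapse of $\beta$ into (20) can be checked numerically. In the Python sketch below (an illustration on an assumed toy triangle with an arbitrary $\theta$; helper names are hypothetical), `objective_19` evaluates (19) and `objective_20` evaluates (20) at $z = C^T\alpha$. For every small integer $\alpha \ge 0$ tried, (19) at the smallest feasible $\beta = \max(0, C^T\alpha - \mathbf{1})$ equals (20), and by weak duality neither falls below $-1$, which is the optimum of (17) for this instance ($\lambda = (-2, 2, 2)$).

```python
from itertools import product

# Assumed toy instance: a triangle whose cut indicator vectors are the rows of C.
C = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
theta = [-3, 2, 2]


def cone_point(alpha):
    """z = C^T alpha; for alpha >= 0 this is a point of the cut cone."""
    return [sum(a * X[e] for a, X in zip(alpha, C)) for e in range(len(theta))]


def objective_19(alpha, beta):
    """theta^T C^T alpha + theta_tilde^T beta, with theta_tilde = -min(0, theta)."""
    z = cone_point(alpha)
    theta_tilde = [-min(0, t) for t in theta]
    return sum(t * ze for t, ze in zip(theta, z)) + sum(
        tt * b for tt, b in zip(theta_tilde, beta)
    )


def smallest_beta(alpha):
    """Smallest beta allowed in (19): max(0, C^T alpha - 1), componentwise."""
    return [max(0, ze - 1) for ze in cone_point(alpha)]


def objective_20(z):
    """theta^T z - min(theta, 0)^T max(z - 1, 0)."""
    return sum(t * ze - min(t, 0) * max(ze - 1, 0) for t, ze in zip(theta, z))


for alpha in product(range(3), repeat=len(C)):
    value = objective_20(cone_point(alpha))
    assert value == objective_19(alpha, smallest_beta(alpha))
    assert value >= -1  # weak duality against the primal optimum of this instance
print(objective_20(cone_point((0, 0, 0, 1))))
```

Placing unit weight on the cut that isolates node 0, i.e. $\alpha = (0, 0, 0, 1)$, attains $-1$, so for this small instance the primal and dual values coincide.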
If we drop the upper-bound constraints on $\lambda$ in Equation (14) of the main paper, we get the simplified dual LP:

$$
\begin{aligned}
\min_{z}\ & \theta^T z\\
\text{s.t.}\ & z \in \{C^T\alpha : \alpha \ge 0\}\\
& z \le \mathbf{1}
\end{aligned}\tag{21}
$$

As discussed in the paper, this analysis provides several insights on the nature 
of the lower-bound: 

1. One can see that, geometrically, the PlanarCC bound is equivalent to optimizing
over a relaxation of the multi-cut polytope given by the intersection
of the cut cone and the unit hypercube. This seems natural since the cut
cone and the multi-cut cone coincide [31] and the cut cone can be compactly
described for planar graphs.

2. Intuitively, the upper-bound constraints on $\lambda$ in the original LP are irrelevant
to the final value of the LP in the case that all cuts are included in $C$.
This is because any multicut can be represented as a sum of isolating cuts,
each with weight 0.5. If such a solution optimizes LP (5), it is also optimal
for LP (4). We give a detailed proof in the next section which holds even
when $C$ only includes a subset of cuts.

3. Finally, one can see the relation between this bound and the standard cutting-plane
approaches used, for example, in [38] or the ILP solution of [18]. In a
standard cutting-plane approach, one optimizes the LP relaxation $\min_z \theta^T z$
with a subset of the constraints that define the multi-cut polytope and then 
successively add constraints, carving away parts of the search space until an 
integral solution is found. In contrast, each time we add a cutting plane to 
our primal LP, this adds another row to the matrix C in the dual LP which 
expands the set of allowable solutions z. Thus our algorithm can be viewed 
as a delayed column generation scheme for the dual LP in which we keep 
growing the space of reachable z until an optimum is found. 
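The decomposition in the second observation can be verified by brute force on a small planar graph. The Python sketch below is our own illustration; the graph (a 4-cycle with one chord) is an assumption and the helper names are hypothetical. For every labeling of the nodes, it checks that the multicut vector equals one half of the sum of the isolating cuts $\delta(S)$ of its label classes. That sum is a nonnegative combination of cut vectors with entries in $\{0, 1\}$, so every multicut lies in the intersection of the cut cone and the unit hypercube, as the first observation requires.

```python
from fractions import Fraction
from itertools import product

# Assumed toy graph: 4-cycle 0-1-2-3 plus the chord (0, 2); it is planar.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def multicut_vector(labels):
    """X_e = 1 iff the endpoints of e carry different labels."""
    return [int(labels[u] != labels[v]) for u, v in EDGES]


def half_sum_of_isolating_cuts(labels):
    """Sum over label classes S of 0.5 * delta(S), where delta(S) cuts S from the rest."""
    total = [Fraction(0)] * len(EDGES)
    for comp in set(labels):
        for i, (u, v) in enumerate(EDGES):
            if (labels[u] == comp) != (labels[v] == comp):
                total[i] += Fraction(1, 2)
    return total


# An edge between two different classes is counted once in the isolating cut
# of each of them, so its two weights of 1/2 add up to exactly 1.
for labels in product(range(4), repeat=4):
    assert half_sum_of_isolating_cuts(labels) == multicut_vector(labels)
print("all", 4 ** 4, "labelings decompose")
```

Exact `Fraction` arithmetic avoids any floating-point tolerance in the comparison.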

Appendix B: Additional constraints on $\lambda$ don't affect the bound

In Section 5 we introduced the constraint $\lambda_e \le \max\{0, \theta_e\}$ without a formal
justification. Here we show this constraint does not decrease the lower-bound.
Suppose we first optimize the lower-bound without including the constraint
$\lambda_e \le \max\{0, \theta_e\}$. Let $\lambda^*$ denote the optimizing parameters. Our strategy is to show
that $\lambda^*$ can be modified to satisfy $\lambda_e \le \max\{0, \theta_e\}$ $\forall e$ without loosening the
bound.

We first restate the definition of the lower-bound: 

$$
CC^* \ge \sum_e (\theta_e - \lambda^*_e) + \min_{X \in \mathcal{C}} \sum_e \lambda^*_e X_e \tag{22}
$$
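For intuition, the bound (22) can be compared against the exact optimum on a toy problem. The following Python sketch is an illustration, not the paper's solver: it enumerates cuts and multicuts by brute force, so it only scales to a handful of nodes, and its helper names are hypothetical. It assumes $\lambda$ satisfies $\theta_e \le \lambda_e$ and $\sum_e \lambda_e X_e \ge 0$ for every cut $X$, the remaining constraints of (17). Under these assumptions the bound never exceeds $CC^*$, because every multicut is half a sum of isolating cuts.

```python
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cut_vectors(n, edges):
    """Cut indicator vectors, one per 2-coloring of the n nodes."""
    return {tuple(int(s[u] != s[v]) for u, v in edges) for s in product((0, 1), repeat=n)}


def multicut_vectors(n, edges):
    """Multicut indicator vectors, one per labeling of the n nodes."""
    return {tuple(int(s[u] != s[v]) for u, v in edges) for s in product(range(n), repeat=n)}


def cc_optimum(n, edges, theta):
    """CC* = min over multicuts X of sum_e theta_e X_e (exhaustive search)."""
    return min(dot(theta, X) for X in multicut_vectors(n, edges))


def lower_bound_22(n, edges, theta, lam):
    """sum_e (theta_e - lam_e) + min over cuts X of sum_e lam_e X_e."""
    if any(t > l for t, l in zip(theta, lam)):
        raise ValueError("expected theta_e <= lambda_e for every edge")
    cut_term = min(dot(lam, X) for X in cut_vectors(n, edges))
    if cut_term < 0:
        raise ValueError("expected sum_e lambda_e X_e >= 0 for every cut X")
    return sum(t - l for t, l in zip(theta, lam)) + cut_term


# Assumed toy instance: a triangle with one attractive and two repulsive edges.
triangle = [(0, 1), (1, 2), (0, 2)]
theta = [-3, 2, 2]
print(cc_optimum(3, triangle, theta))                  # -1
print(lower_bound_22(3, triangle, theta, [-2, 2, 2]))  # -1, a tight choice of lambda
print(lower_bound_22(3, triangle, theta, [0, 2, 2]))   # -3, a looser feasible lambda
```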
The convex hull of the set of cut indicator vectors is known as the cut polytope,
which we denote $\mathcal{C}_P$. For planar graphs, this polytope is compactly
described by the set of cycle inequalities [33]. One can relax the discrete optimization
over $\mathcal{C}$ to an LP over the cut polytope (see e.g., [38]). In the following
analysis, we work with the dual of this relaxed LP, in which we have a collection
of dual variables $\{\phi^c\}$ corresponding to the constraint on each cycle $c$ of the
planar graph [40].

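$$
\min_{X \in \mathcal{C}} \sum_e \lambda^*_e X_e = \max_{\phi:\ \sum_c \phi^c = \lambda^*}\ \sum_c \min_{X^c \in \mathcal{C}_c} \sum_{e \in c} \phi^c_e X^c_e \tag{23}
$$

where $\mathcal{C}_c$ denotes the set of cut indicator vectors of cycle $c$ (its edge subsets of even cardinality), and the maximization is over reparameterizations $\phi$ with $\sum_c \phi^c_e = \lambda^*_e$ for every edge $e$.

The right-hand side corresponds to dual decomposition into a collection of subproblems,
each of which is a cycle from the original graph. Let $\phi^*$ denote an
optimizer of this dual LP. We write our lower-bound in terms of this cycle decomposition as:

$$
CC^* \ge \sum_e (\theta_e - \lambda^*_e) + \sum_c \min_{X^c \in \mathcal{C}_c} \sum_{e \in c} \phi^{*c}_e X^c_e \tag{24}
$$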
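The cycle subproblems in (23) and (24) range over the cuts of a single cycle, and the cycle inequalities mentioned above are the constraints attached to each cycle. As an illustration (our own sketch; the helper name is hypothetical), the Python code below checks the standard form of these inequalities for one cycle, $\sum_{e \in F} x_e - \sum_{e \in c \setminus F} x_e \le |F| - 1$ for every odd-size $F \subseteq c$. On 0/1 vectors they hold exactly for the even-size edge subsets, i.e. the cuts of the cycle, and averages of cuts pass as well, as they must for inequalities valid on the cut polytope.

```python
from fractions import Fraction
from itertools import combinations, product


def satisfies_cycle_inequalities(x):
    """x[i] is the value on the i-th edge of one cycle.

    Checks x(F) - x(rest of the cycle) <= |F| - 1 for every odd-size edge subset F.
    """
    total = sum(x)
    for size in range(1, len(x) + 1, 2):
        for F in combinations(range(len(x)), size):
            inside = sum(x[i] for i in F)
            if inside - (total - inside) > size - 1:
                return False
    return True


# On 0/1 vectors the inequalities hold exactly for an even number of cut edges.
for x in product((0, 1), repeat=4):
    assert satisfies_cycle_inequalities(x) == (sum(x) % 2 == 0)

# The average of the three nonzero cuts of a triangle is feasible ...
print(satisfies_cycle_inequalities([Fraction(2, 3)] * 3))
# ... while cutting all three edges of a triangle is not a cut and is rejected.
print(satisfies_cycle_inequalities([1, 1, 1]))
```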

Lemma 1: For all cycles $c$, $\min_{X^c \in \mathcal{C}_c} \sum_{e \in c} \phi^{*c}_e X^c_e = 0$.

Lemma 1 holds because the minimum energy of each cycle subproblem is upper-bounded
by zero (the configuration that cuts no edges has zero energy), while by (23) the sum of all
the cycle subproblem energies equals $\min_{X \in \mathcal{C}} \sum_e \lambda^*_e X_e$, which is zero due to the
constraint $\sum_e \lambda^*_e X_e \ge 0$ for all $X \in \mathcal{C}$. Notice that if there is a negative-valued
parameter $\phi^{*c}_e$ on the $c$'th cycle subproblem, then any other edge $f \ne e$ must
have a parameter setting such that $\phi^{*c}_e + \phi^{*c}_f \ge 0$. Otherwise, the configuration
that cuts exactly $e$ and $f$ would have negative energy. In particular, this implies that
each cycle subproblem can have at most one negative parameter.
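The argument above can be checked exhaustively on small cycles. The Python sketch below is illustrative only: the parameter range and the helper names are arbitrary assumptions. It computes the minimum of a cycle subproblem by enumerating the even-size edge subsets of the cycle, then confirms that whenever this minimum is zero, at most one parameter is negative and every pair of edges satisfies $\phi^c_e + \phi^c_f \ge 0$.

```python
from itertools import combinations, product


def cycle_min(phi):
    """min over cuts X^c of sum_e phi_e X^c_e for one cycle.

    The cuts of a cycle are exactly its edge subsets of even size (the empty
    subset included), so brute force over those subsets suffices.
    """
    best = 0
    for size in range(2, len(phi) + 1, 2):
        for S in combinations(range(len(phi)), size):
            best = min(best, sum(phi[i] for i in S))
    return best


def obeys_lemma1_consequences(phi):
    """At most one negative entry, and every pair of entries sums to >= 0."""
    negatives = sum(1 for p in phi if p < 0)
    pairs_ok = all(phi[e] + phi[f] >= 0 for e, f in combinations(range(len(phi)), 2))
    return negatives <= 1 and pairs_ok


# Every 4-cycle parameterization with entries in {-2, ..., 2} whose minimum is 0.
for phi in product(range(-2, 3), repeat=4):
    if cycle_min(phi) == 0:
        assert obeys_lemma1_consequences(phi)
print(cycle_min([-1, 1, 2]), cycle_min([-1, -1, 3]))
```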

Lemma 2: For each edge $e$ with $\theta_e - \lambda^*_e < 0$ contained in a cycle $c$, either
$\phi^{*c}_e \le 0$ or there exists $f$ such that $\phi^{*c}_e + \phi^{*c}_f = 0$.

Suppose there existed an edge $e$ and cycle $c$ for which the implication of
the lemma is false, that is, $\theta_e - \lambda^*_e < 0$, $\phi^{*c}_e > 0$, and $\phi^{*c}_e + \phi^{*c}_f > 0$ for all $f \ne e$.
This would mean there is no minimizing configuration of the cycle subproblem
that includes edge $e$ (such a cut would have positive energy). However, edge $e$ is
necessarily cut in the single-edge problem, since $\theta_e - \lambda^*_e < 0$. In such a case the lower-bound could
be tightened by the following update:

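$$
\begin{aligned}
f &= \arg\min_{f' \ne e} \big(\phi^{*c}_e + \phi^{*c}_{f'}\big)\\
v &= \min\big[-(\theta_e - \lambda^*_e),\ \phi^{*c}_e + \phi^{*c}_f\big]\\
\phi^{*c}_e &\leftarrow \phi^{*c}_e - v
\end{aligned}\tag{25}
$$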

This update lowers $\lambda^*_e = \sum_c \phi^{*c}_e$ by $v$ (keeping $\theta_e \le \lambda^*_e$) and leaves the minimum
of cycle subproblem $c$ at zero, so it would drive up the energy of the single-edge problem, thus increasing
the lower-bound by a positive quantity $v$. Since $\lambda^*$ optimizes the lower-bound by
assumption, such an edge $e$ and cycle $c$ must not exist.
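A single tightening step (25) can be sketched as follows, for one cycle whose parameters are stored in a Python list (a hypothetical representation) and an edge index `e` that meets the hypotheses of the argument above. Because $\lambda_e$ also collects contributions from other cycles containing $e$, it is passed in separately. The example values are arbitrary assumptions; they show the single-edge energy $\theta_e - \lambda^*_e$ rising by $v$ while the cycle minimum stays at zero.

```python
from itertools import combinations


def cycle_min(phi):
    """Minimum over even-size edge subsets of one cycle (empty subset included)."""
    best = 0
    for size in range(2, len(phi) + 1, 2):
        for S in combinations(range(len(phi)), size):
            best = min(best, sum(phi[i] for i in S))
    return best


def tighten(phi_c, e, theta_e, lam_e):
    """Update (25) on cycle parameters phi_c at edge index e.

    Returns the new cycle parameters, the new lambda_e and the gain v.
    """
    others = [g for g in range(len(phi_c)) if g != e]
    # Hypotheses of the contradiction argument (the negation of Lemma 2).
    assert theta_e - lam_e < 0 and phi_c[e] > 0
    assert all(phi_c[e] + phi_c[g] > 0 for g in others)
    f = min(others, key=lambda g: phi_c[e] + phi_c[g])
    v = min(-(theta_e - lam_e), phi_c[e] + phi_c[f])
    new_phi = list(phi_c)
    new_phi[e] -= v  # lowers phi_e, and therefore lambda_e, by v
    return new_phi, lam_e - v, v


phi_c, theta_e, lam_e = [2, 1, 3], -3, -1
new_phi, new_lam_e, v = tighten(phi_c, 0, theta_e, lam_e)
print(new_phi, v)                            # updated cycle parameters and gain v
print(theta_e - lam_e, theta_e - new_lam_e)  # single-edge energy before and after
print(cycle_min(phi_c), cycle_min(new_phi))  # cycle minimum stays at zero
```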

Modifying $\lambda$ to satisfy the constraint: We now describe an iterative procedure
that starts with $\lambda^*$ and $\phi^*$ and produces modified $\lambda^+$ and $\phi^+$ obeying the
constraint:

$$
(\theta_e - \lambda^+_e < 0) \Rightarrow (\phi^{+c}_e \le 0) \qquad \forall (c, e),\ e \in c \tag{26}
$$

The lower-bound corresponding to $\lambda^+$ will have the same value as that of $\lambda^*$ and
will satisfy the additional upper-bound constraint as desired.

For each $(e, c)$ such that $\theta_e - \lambda^*_e < 0$ and $\phi^{*c}_e > 0$, Lemma 2 establishes
that there exists an edge $f$ such that $\phi^{*c}_e + \phi^{*c}_f = 0$. Choose one such edge $f$ and
apply the parameter updates:

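$$
\begin{aligned}
v &\leftarrow \max\big[\theta_e - \lambda^*_e,\ -\phi^{*c}_e\big]\\
\phi^{*c}_e &\leftarrow \phi^{*c}_e + v\\
\phi^{*c}_f &\leftarrow \phi^{*c}_f - v\\
\lambda^*_e &\leftarrow \lambda^*_e + v\\
\lambda^*_f &\leftarrow \lambda^*_f - v
\end{aligned}\tag{27}
$$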

Repeatedly apply these updates until there exists no $(e, c)$ such that $\theta_e - \lambda^*_e < 0$
and $\phi^{*c}_e > 0$. These updates do not change the minimizing configuration of either the
cycle or the edge subproblems, and the energy changes they induce in the single-edge
subproblems of $e$ and $f$ cancel. They also respect the lower-bound constraint $\theta_e \le \lambda_e$.
Thus the bound remains constant. The final results of this procedure are denoted $\lambda^+$ and $\phi^+$.

Lemma 3: For all edges $e$, $(\theta_e - \lambda^+_e < 0) \Rightarrow (\lambda^+_e \le 0)$.

The algorithm terminates when $(\theta_e - \lambda^+_e < 0) \Rightarrow (\phi^{+c}_e \le 0)$ for all cycles $c$
and edges $e$. Since $\phi^+$ is a reparameterization of $\lambda^+$, we have $\sum_c \phi^{+c}_e = \lambda^+_e$
for each edge $e$, which establishes the lemma.

Claim: For all edges $e$, $\lambda^+_e \le \max\{0, \theta_e\}$.

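If $\theta_e - \lambda^+_e = 0$, then $\lambda^+_e = \theta_e \le \max\{0, \theta_e\}$ and the claim is satisfied. If
$\theta_e - \lambda^+_e < 0$, then by Lemma 3 we have $\lambda^+_e \le 0$. For such an edge it must be that $\theta_e < 0$, as we cannot
simultaneously have $\theta_e - \lambda^+_e < 0$, $\lambda^+_e \le 0$, and $\theta_e \ge 0$; hence $\lambda^+_e \le 0 = \max\{0, \theta_e\}$.
Thus, we can transform any optimizer $\lambda^*$ into an optimizer $\lambda^+$ that achieves the same lower-bound and
satisfies the additional constraints.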
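The whole procedure can be exercised on a toy instance. The Python sketch below is our own illustration with hypothetical data structures: `theta` and `lam` map edges to values, and `phi` maps each cycle to its edge parameters. It applies update (27) to a triangle (a single cycle, so $\lambda = \phi$ initially) until no pair $(e, c)$ violates (26). Along with $\lambda_e$ and $\lambda_f$ it updates $\phi^c_e$ and $\phi^c_f$, so that $\lambda_e = \sum_c \phi^c_e$ keeps holding. It then checks that the bound is unchanged and that $\lambda^+_e \le \max\{0, \theta_e\}$. Termination is only guarded by an iteration cap here, not proven.

```python
from itertools import combinations


def cycle_min(values):
    """Minimum over even-size edge subsets of one cycle (empty subset included)."""
    values = list(values)
    best = 0
    for size in range(2, len(values) + 1, 2):
        for S in combinations(range(len(values)), size):
            best = min(best, sum(values[i] for i in S))
    return best


def bound(theta, lam, phi):
    """Right-hand side of (24): single-edge terms plus cycle-subproblem minima."""
    edge_terms = sum(theta[e] - lam[e] for e in theta)
    return edge_terms + sum(cycle_min(p.values()) for p in phi.values())


def repair(theta, lam, phi, max_rounds=1000):
    """Apply update (27) until no (e, c) has theta_e - lam_e < 0 and phi[c][e] > 0."""
    lam = dict(lam)
    phi = {c: dict(p) for c, p in phi.items()}
    for _ in range(max_rounds):
        changed = False
        for p in phi.values():
            for e in p:
                if theta[e] - lam[e] < 0 and p[e] > 0:
                    # Lemma 2 guarantees a partner edge f with p[e] + p[f] == 0.
                    f = next((g for g in p if g != e and p[e] + p[g] == 0), None)
                    if f is None:
                        raise ValueError("no partner edge; Lemma 2 hypotheses fail")
                    v = max(theta[e] - lam[e], -p[e])
                    p[e] += v
                    p[f] -= v
                    lam[e] += v
                    lam[f] -= v
                    changed = True
        if not changed:
            return lam, phi
    raise RuntimeError("no fixed point reached within max_rounds")


# Assumed toy instance: one triangle; lam_star violates lam_a <= max(0, theta_a).
theta = {"a": -1, "b": -4, "c": 5}
lam_star = {"a": 3, "b": -3, "c": 5}
phi_star = {"triangle": dict(lam_star)}
lam_plus, phi_plus = repair(theta, lam_star, phi_star)
print(lam_plus)
print(bound(theta, lam_star, phi_star), bound(theta, lam_plus, phi_plus))
print(all(lam_plus[e] <= max(0, theta[e]) for e in theta))
```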



